Flexible high-dimensional classification machines and their asymptotic properties
نویسندگان
چکیده
Classification is an important topic in statistics and machine learning with great potential in many real applications. In this paper, we investigate two popular large-margin classification methods, Support Vector Machine (SVM) and Distance Weighted Discrimination (DWD), under two contexts: the high-dimensional, low-sample size data and the imbalanced data. A unified family of classification machines, the FLexible Assortment MachinE (FLAME) is proposed, within which DWD and SVM are special cases. The FLAME family helps to identify the similarities and differences between SVM and DWD. It is well known that many classifiers overfit the data in the high-dimensional setting; and others are sensitive to the imbalanced data, that is, the class with a larger sample size overly influences the classifier and pushes the decision boundary towards the minority class. SVM is resistant to the imbalanced data issue, but it overfits high-dimensional data sets by showing the undesired data-piling phenomenon. The DWD method was proposed to improve SVM in the highdimensional setting, but its decision boundary is sensitive to the imbalanced ratio of sample sizes. Our FLAME family helps to understand an intrinsic connection between SVM and DWD, and provides a trade-off between sensitivity to the imbalanced data and overfitting the high-dimensional data. Several asymptotic properties of the FLAME classifiers are studied. Simulations and real data applications are investigated to illustrate theoretical findings.
منابع مشابه
Asymptotic Distributions of Estimators of Eigenvalues and Eigenfunctions in Functional Data
Functional data analysis is a relatively new and rapidly growing area of statistics. This is partly due to technological advancements which have made it possible to generate new types of data that are in the form of curves. Because the data are functions, they lie in function spaces, which are of infinite dimension. To analyse functional data, one way, which is widely used, is to employ princip...
متن کاملA fixed and flexible maintenance operations planning optimization in a parallel batch machines manufacturing system
Scheduling has become an attractive area for artificial intelligence researchers. On other hand, in today's real-world manufacturing systems, the importance of an efficient maintenance schedule program cannot be ignored because it plays an important role in the success of manufacturing facilities. A maintenance program may be considered as the heath care of manufacturing machines and equipments...
متن کاملOptimizing Flexible Manufacturing System: A Developed Computer Simulation Model
In recent years, flexible manufacturing system as a response to market demands has been proposed to increase product diversity, optimum utilization of machines andperiods of short-term products.The development of computer systems has provided the ability to build machines with high functionality and the necessary flexibility to perform various operations. However, due to the complexity and the ...
متن کاملتعیین ماشینهای بردار پشتیبان بهینه در طبقهبندی تصاویر فرا طیفی بر مبنای الگوریتم ژنتیک
Hyper spectral remote sensing imagery, due to its rich source of spectral information provides an efficient tool for ground classifications in complex geographical areas with similar classes. Referring to robustness of Support Vector Machines (SVMs) in high dimensional space, they are efficient tool for classification of hyper spectral imagery. However, there are two optimization issues which s...
متن کاملPenalized gaussian process regression and classification for high-dimensional nonlinear data.
The model based on Gaussian process (GP) prior and a kernel covariance function can be used to fit nonlinear data with multidimensional covariates. It has been used as a flexible nonparametric approach for curve fitting, classification, clustering, and other statistical problems, and has been widely applied to deal with complex nonlinear systems in many different areas particularly in machine l...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Journal of Machine Learning Research
دوره 16 شماره
صفحات -
تاریخ انتشار 2015